Linguistic Phylogenetic Inference by PAM-like Matrices

Authors

  • Antonella Delmestri
  • Nello Cristianini
Abstract

We apply to the task of linguistic phylogenetic inference a successful cognate identification learning model based on PAM-like matrices. We train our system and we employ the learned parameters for measuring the lexical distance between languages. We estimate phylogenetic trees using distance-based methods on an Indo-European database. Our results reproduce correctly all the established major language groups present in the dataset, are compatible with the Indo-European benchmark tree and include also some of the supported higher-level structures. We review and compare other studies reported in the literature with respect to recognised aspects of Indo-European history.

Similar resources

String Similarity Measures and PAM-like Matrices for Cognate Identification

We present a new automatic learning system for the identification of cognates, words that derive from a common ancestor and share the same etymological origin. Our approach combines and adapts several techniques developed for biological sequence analysis to the natural language processing environment. We design a linguistic-inspired matrix to align sensibly our training dataset. We introduce a ...

Substitution Matrices and Mutual Information Approaches to Modeling Evolution

Substitution matrices are at the heart of Bioinformatics: sequence alignment, database search, phylogenetic inference, protein family classification are based on Blosum, Pam, JTT, mtREV24 and other matrices. These matrices provide means of computing models of evolution and assessing the statistical relationships amongst sequences. This paper reports two results; first we show how Bayesian and grid...

T-REX: reconstructing and visualizing phylogenetic trees and reticulation networks

T-REX (tree and reticulogram reconstruction) is an application to reconstruct phylogenetic trees and reticulation networks from distance matrices. The application includes a number of tree fitting methods like NJ, UNJ or ADDTREE which have been very popular in phylogenetic analysis. At the same time, the software comprises several new methods of phylogenetic analysis such as: tree re...

Continuous Space Representations of Linguistic Typology and their Application to Phylogenetic Inference

For phylogenetic inference, linguistic typology is a promising alternative to lexical evidence because it allows us to compare an arbitrary pair of languages. A challenging problem with typology-based phylogenetic inference is that the changes of typological features over time are less intuitive than those of lexical features. In this paper, we work on reconstructing typologically natural ances...

Different Versions of the Dayhoff Rate Matrix

Many phylogenetic inference methods are based on Markov models of sequence evolution. These are usually expressed in terms of a matrix (Q) of instantaneous rates of change but some models of amino acid replacement, most notably the PAM model of Dayhoff and colleagues, were originally published only in terms of time-dependent probability matrices (P(t)). Previously published methods for deriving...

Journal:
  • Journal of Quantitative Linguistics

Volume 19

Publication date: 2012